-
Notifications
You must be signed in to change notification settings - Fork 1.6k
🐛 Fix flakes in e2e tests for metrics #5138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Hi @mayuka-c. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mayuka-c The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
f130aac to
1c32309
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mayuka-c 👋
That looks great! However, this change only makes sense for projects that actually have webhooks.
Not all projects do — we use markers to inject code (see Kubebuilder markers
).
For example:
+kubebuilder:scaffold:e2e-webhooks-checks
Then, if we really need that we will need to see if we can use the same; it seems not. Then we need a new marker. So, we need to be 1000% sure that is the best way. Ideally we should avoid create new markers and etc since we need keep the support for those, maker harder for end user customisartions, etc.
So, if we need to check whether the webhook is serving before the metrics, we should only add this logic when webhooks are present — meaning we’ll need to use a marker for it.
Also, please make sure the scaffolded code remains as generic as possible, so it still works even if users customise their configurations.
Lastly, since this change affects the scaffolded e2e tests in projects (which impact end users), we need to be careful with the commit type.
We should use the 🐛 emoji instead of 🌱 — the seed emoji is only used for changes that don’t affect users and can be ignored in the release notes.
|
Hi @camilamacedo86
If I'm not wrong, I could add a new fragment to do the webhook readiness check here and replace with the marker here But the only problem that I see is, it will end up running other fragment checks too which is not required. Shall I go ahead to define a new marker? or if you have any better solution, please do suggest :) |
Sorry my bad, I will update the title with proper emoji. Thank you :) |
|
Hi @mayuka-c Let's create a new marker |
Sure thank you @camilamacedo86 . I will review the code and make sure it is generic. |
b703a44 to
d058ca9
Compare
|
Hi @camilamacedo86 👋 I have added a new marker Please let me know if this is fine or if I'm missing |
| g.Expect(err).NotTo(HaveOccurred(), "Webhook server not responding on port 443") | ||
| } | ||
| Eventually(verifyWebhookServiceReady, 2*time.Minute).Should(Succeed()) | ||
| // +kubebuilder:scaffold:e2e-webhooks-readiness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may can simplify with
By("waiting for webhook service to be ready if webhooks are configured")
verifyWebhookServiceReady := func(g Gomega) {
const webhookServiceName = "{{ProjectName}}-webhook-service"
cmd := exec.Command("kubectl", "get", "service", webhookServiceName, "-n", namespace)
_, err := utils.Run(cmd)
if err != nil {
// Project has no webhook service; nothing to wait on.
return
}
cmd = exec.Command("kubectl", "get", "pods", "-l", "control-plane=controller-manager",
"-n", namespace, "-o", "jsonpath={.items[0].status.conditions[?(@.type=='Ready')].status}")
output, err := utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred())
g.Expect(output).To(Equal("True"),
"Controller manager pod not ready (webhook server may not be accepting connections)")
cmd = exec.Command("kubectl", "get", "endpoints", webhookServiceName,
"-n", namespace, "-o", "jsonpath={.subsets[*].addresses[*].ip}")
output, err = utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred())
g.Expect(output).NotTo(BeEmpty(), "Webhook service endpoints are not ready")
cmd = exec.Command("kubectl", "get", "--raw",
fmt.Sprintf("/api/v1/namespaces/%s/services/https:%[2]s:https/proxy/readyz", namespace, webhookServiceName))
_, err = utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(), "Webhook server not responding on port 443")
}
Eventually(verifyWebhookServiceReady, 2*time.Minute).Should(Succeed())There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks you, this looks Perfect :)
Regarding this I have slightly modified. Since now we have the marker which runs exclusively for webhooks, instead of ignoring the error, will be failing if service is not found.
const webhookServiceName = "{{ProjectName}}-webhook-service"
cmd := exec.Command("kubectl", "get", "service", webhookServiceName, "-n", namespace)
_, err := utils.Run(cmd)
if err != nil {
// Project has no webhook service; nothing to wait on.
return
}
to
const webhookServiceName = "{{.ProjectName}}-webhook-service"
// Webhook service should exist since webhooks are configured
cmd := exec.Command("kubectl", "get", "service", webhookServiceName, "-n", namespace)
_, err := utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(), "Webhook service should exist but was not found")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this
cmd = exec.Command("kubectl", "get", "--raw",
fmt.Sprintf("/api/v1/namespaces/%s/services/https:%[2]s:https/proxy/readyz", namespace, webhookServiceName))
_, err = utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(), "Webhook server not responding on port 443")
Looks like it is failing after using the service proxy to check for readiness here: https://github.com/mayuka-c/kubebuilder/actions/runs/18962257859/job/54151953303.
I will see on what could be the reason for this. Please do let me know if the endpoint used is wrong. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use a pod with curl to do so is fine but we need to cleanup
That is why I thought in some other alternative option like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I reverted to use curl, it does remove the pod once it finishes with --rm flag. Please do let me know if it is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to ensure that all that we create is cleaned.
Use -rm might be a good option if the pod is only removed after the check.
It seems it might be a good approach I will check it out.
Thank you a lot for looking on that 🚀
Well done for sure.
05fc492 to
014349e
Compare
Fix flakes - 2 Run make generate RUn make generate Minor fix Minor fix Fix-2 Fix-3 Implement a new marker for webhook readiness Lint fix scaffold md fix Address comment-2 Minor fix Test-1 Fix-2 Add error debug print Revert change
camilamacedo86
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mayuka-c
That is great. But I think we could improve for
By("waiting for webhook service to be ready")
verifyWebhookServiceReady := func(g Gomega) {
const webhookServiceName = "project-v4-multigroup-webhook-service"
// 1) Ensure the webhook Service exists
cmd := exec.Command("kubectl", "get", "service", webhookServiceName, "-n", namespace)
_, err := utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(), "Webhook service should exist but was not found")
// 2) Wait until the controller-manager Pod is Ready
cmd = exec.Command(
"kubectl", "wait", "pod",
"-l", "control-plane=controller-manager",
"-n", namespace,
"--for=condition=Ready",
"--timeout=90s",
)
_, err = utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(), "Controller manager pod not ready (webhook server may not be accepting connections)")
// 3) Verify webhook Service endpoints are ready using EndpointSlice (modern clusters)
cmd = exec.Command(
"kubectl", "get", "endpointslices",
"-l", "kubernetes.io/service-name="+webhookServiceName,
"-n", namespace,
"-o", "jsonpath={.items[*].endpoints[?(@.conditions.ready==true)].addresses[*]}",
)
output, err := utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(), "Failed to query EndpointSlices for the webhook service")
g.Expect(strings.TrimSpace(output)).NotTo(BeEmpty(),
"Webhook service has no ready endpoints (EndpointSlice has zero ready addresses)")
// 4) Test webhook connectivity using an ephemeral curl pod in the same namespace
cmd = exec.Command(
"kubectl", "run", "webhook-test",
"--rm", "-i", "--restart=Never",
"-n", namespace, // run the curl pod in the test namespace
"--image=curlimages/curl:latest", "--",
"curl", "-ksSf", "--connect-timeout", "5",
"--retry", "3", "--retry-delay", "1",
"https://"+webhookServiceName+"."+namespace+".svc:443/readyz",
)
_, err = utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(),
"Webhook server not responding or /readyz endpoint unhealthy on port 443")
}
Eventually(verifyWebhookServiceReady, 2*time.Minute).Should(Succeed())
// +kubebuilder:scaffold:e2e-webhooks-readiness
Then, following some questions:
- Do we need to keep
// +kubebuilder:scaffold:e2e-webhooks-readinessshould not that be replaced by the content?
| g.Expect(err).NotTo(HaveOccurred(), "Webhook server not responding on port 443") | ||
| } | ||
| Eventually(verifyWebhookServiceReady, 2*time.Minute).Should(Succeed()) | ||
| // +kubebuilder:scaffold:e2e-webhooks-readiness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to ensure that all that we create is cleaned.
Use -rm might be a good option if the pod is only removed after the check.
It seems it might be a good approach I will check it out.
Thank you a lot for looking on that 🚀
Well done for sure.
| "curl", "-k", "--connect-timeout", "5", | ||
| "https://"+webhookServiceName+"."+namespace+".svc:443/readyz") | ||
| _, err = utils.Run(cmd) | ||
| g.Expect(err).NotTo(HaveOccurred(), "Webhook server not responding on port 443") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-n namespace→ ensures the curl pod runs and cleans up in the test namespace.-ksSf→ curl fails if it gets a non-2xx HTTP response, is silent otherwise, and ignores TLS verification errors.--retry 3 --retry-delay 1→ helps handle transient webhook startup delays.Improved error message→ clearer context if the readiness check fails.
cmd = exec.Command(
"kubectl", "run", "webhook-test",
"--rm", "-i", "--restart=Never",
"-n", namespace, // ensure the pod runs in the same namespace
"--image=curlimages/curl:latest", "--",
"curl", "-ksSf", "--connect-timeout", "5",
"--retry", "3", "--retry-delay", "1",
"https://"+webhookServiceName+"."+namespace+".svc:443/readyz",
)
_, err = utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(), "Webhook server not responding or /readyz endpoint unhealthy on port 443")
| "-n", namespace, "-o", "jsonpath={.subsets[*].addresses[*].ip}") | ||
| output, err = utils.Run(cmd) | ||
| g.Expect(err).NotTo(HaveOccurred()) | ||
| g.Expect(output).NotTo(BeEmpty(), "Webhook service endpoints are not ready") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid use endpoints.
It seems deprecated
cmd = exec.Command(
"kubectl", "get", "endpointslices",
"-l", "kubernetes.io/service-name="+webhookServiceName,
"-n", namespace,
"-o", "jsonpath={.items[*].endpoints[?(@.conditions.ready==true)].addresses[*]}",
)
output, err := utils.Run(cmd)
g.Expect(err).NotTo(HaveOccurred(), "Failed to query EndpointSlices for the webhook service")
g.Expect(strings.TrimSpace(output)).NotTo(BeEmpty(),
"Webhook service has no ready endpoints (EndpointSlice has zero ready addresses)")
| output, err := utils.Run(cmd) | ||
| g.Expect(err).NotTo(HaveOccurred()) | ||
| g.Expect(output).To(Equal("True"), | ||
| "Controller manager pod not ready (webhook server may not be accepting connections)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use --for=condition=Ready
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds webhook readiness checks to e2e tests to ensure webhook services are fully operational before proceeding with metrics collection. This helps prevent flaky test failures caused by webhook services not being ready when subsequent test operations are executed.
- Adds a new
e2e-webhooks-readinessscaffold marker for injecting webhook readiness verification - Implements comprehensive webhook readiness checks including service existence, pod readiness, endpoint availability, and connectivity tests
- Updates documentation to describe the new scaffold marker
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/plugins/golang/v4/scaffolds/internal/templates/test/e2e/test.go | Adds webhook readiness check template and logic to inject it at the new marker location |
| docs/book/src/reference/markers/scaffold.md | Documents the new e2e-webhooks-readiness scaffold marker |
| testdata/project-v4/test/e2e/e2e_test.go | Adds webhook readiness verification before metrics collection |
| testdata/project-v4-with-plugins/test/e2e/e2e_test.go | Adds webhook readiness verification before metrics collection |
| testdata/project-v4-multigroup/test/e2e/e2e_test.go | Adds webhook readiness verification before metrics collection |
| docs/book/src/multiversion-tutorial/testdata/project/test/e2e/e2e_test.go | Adds webhook readiness verification before metrics collection |
| docs/book/src/getting-started/testdata/project/test/e2e/e2e_test.go | Adds scaffold marker placeholder for webhook readiness checks |
| docs/book/src/cronjob-tutorial/testdata/project/test/e2e/e2e_test.go | Adds webhook readiness verification before metrics collection |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "-n", namespace, "-o", "jsonpath={.items[0].status.conditions[?(@.type=='Ready')].status}") | ||
| output, err := utils.Run(cmd) | ||
| g.Expect(err).NotTo(HaveOccurred()) | ||
| g.Expect(output).To(Equal("True"), |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace detected at the end of this line. This should be removed for consistency with code style standards.
| g.Expect(output).To(Equal("True"), | |
| g.Expect(output).To(Equal("True"), |
| "Controller manager pod not ready (webhook server may not be accepting connections)") | ||
| // Check if webhook service endpoints are available | ||
| cmd = exec.Command("kubectl", "get", "endpoints", webhookServiceName, |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace detected at the end of this line. This should be removed for consistency with code style standards.
| cmd = exec.Command("kubectl", "get", "endpoints", webhookServiceName, | |
| cmd = exec.Command("kubectl", "get", "endpoints", webhookServiceName, |
| // Test webhook connectivity by checking if webhook server port is responding | ||
| cmd = exec.Command("kubectl", "run", "webhook-test", "--rm", "-i", "--restart=Never", | ||
| "--image=curlimages/curl:latest", "--", | ||
| "curl", "-k", "--connect-timeout", "5", |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace detected at the end of this line. This should be removed for consistency with code style standards.
| "curl", "-k", "--connect-timeout", "5", | |
| "curl", "-k", "--connect-timeout", "5", |
| cmd = exec.Command("kubectl", "run", "webhook-test", "--rm", "-i", "--restart=Never", | ||
| "--image=curlimages/curl:latest", "--", |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded pod name "webhook-test" could cause conflicts when Eventually retries this function. If a previous attempt fails after the pod is created but before it's deleted (e.g., due to curl failure), subsequent attempts will fail with "pod already exists" errors. Consider using a unique name with a timestamp suffix or using --generate-name flag instead.
| cmd = exec.Command("kubectl", "run", "webhook-test", "--rm", "-i", "--restart=Never", | |
| "--image=curlimages/curl:latest", "--", | |
| cmd = exec.Command("kubectl", "run", "--rm", "-i", "--restart=Never", | |
| "--generate-name=webhook-test-", "--image=curlimages/curl:latest", "--", |
Fixes: #5137
By looking at the logs, looks like during the curl metrics pod creation, calls to the webhook is failing due to connection refused. So added precheck to ensure the webhook is ready before beginning with the pod creation.
Validation
Ran the e2e tests multiple times here and did not see the flakes: https://github.com/mayuka-c/kubebuilder/actions/runs/18582517449